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Abstract 

Background: There are a growing number of genomes sequenced with tentative functions assigned to a large 
proportion of the individual genes. Model organisms in laboratory settings form the basis for the assignment of 
gene function, and the ecological context of gene function is lacking. This work addresses this shortcoming by 
investigating expressed genes of sockeye salmon {Oncorhynchus nerka) muscle tissue. We compared morphology 
and gene expression in natural juvenile sockeye populations related to river and lake habitats. Based on previously 
documented divergent morphology, feeding strategy, and predation in association with these distinct 
environments, we expect that burst swimming is favored in riverine population and continuous swimming is 
favored in lake-type population. In turn we predict that morphology and expressed genes promote burst 
swimming in riverine sockeye and continuous swimming in lake-type sockeye. 

Results: We found the riverine sockeye population had deep, robust bodies and lake-type had shallow, 
streamlined bodies. Gene expression patterns were measured using a 16K microarray, discovering 141 genes with 
significant differential expression. Overall, the identity and function of these genes was consistent with our 
hypothesis. In addition, Gene Ontology (GO) enrichment analyses with a larger set of differentially expressed genes 
found the "biosynthesis" category enriched for the riverine population and the "metabolism" category enriched for 
the lake-type population. 

Conclusions: This study provides a framework for understanding sockeye life history from a transcriptomic 
perspective and a starting point for more extensive, targeted studies determining the ecological context of genes. 



Background 

The field of genomics is expanding rapidly with full gen- 
ome and transcriptome sequencing of many model and 
non-model species. Annotating these genomes continues 
to pose a challenge [1]. Due to sequence conservation of 
functional genes and the rapidly growing molecular 
knowledge of model organisms, basic local alignment 
search tools (e.g. BLAST) facilitate the initial annotation of 
non-model genomes [2] . However, the ecological context 
of genes largely remains a mystery; nearly all gene annota- 
tion is based on studies of few model organisms in labora- 
tory environments [3-6]. Thus, genes that function 
primarily in natural settings remain unannotated, and 
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Other genes with known function in laboratory organisms 
have no ecological context. 

Studies of fishes are leading the way in providing an 
ecological context to genomes [e.g., lake whitefish [Core- 
gonus clupeaformis [7,8]], Atlantic salmon [Salmo salar 
[9-11]], killifish [Fundulus heteroclitus [12-14]], three- 
spine stickleback [Gasterosteus aculeatus [15,16]] and 
sockeye salmon [Oncorhynchus nerka [17-19]]]. These 
studies have employed three basic methods to relate gene 
transcription to ecological systems [20]. First is in situ 
gene expression analysis [e.g. [17-19]]. Sampling occurs 
in the ecological context of interest in nature; fish cap- 
ture and RNA preservation occur in the field. This 
method measures both genetic and environmental effects 
on the transcriptome and it is often not possible to assign 
gene expression differences to either source. A study 
design with replication reporting parallel expression dif- 
ferences between two systems reduces population and 
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perhaps environment specific gene expression [e.g. [7]]. 
In general, this method is applicable to many species 
including large or long-lived species where laboratory 
rearing or genetic crosses are not practical. Second, one 
can remove the natural environmental effect and only 
test for genetic effects on gene expression in common 
garden experiments. This strategy compares transcrip- 
tomes of genetically distinct ecotypes in controlled condi- 
tions [e.g. [21]] and is generally applied to species that 
can be reared artificially. Reaction norms may be tested 
by experimentally manipulating conditions. Third, one 
can perform expression quantitative trait loci (eQTL) 
analyses by crossing genetically distinct ecotypes in a 
laboratory setting and mapping gene expression pheno- 
types to linkage groups [e.g. [22,23]]. This method 
requires artificial rearing and is only practical for species 
with short generation times; however, this is also the only 
method of these three able to determine the overall geno- 
mic architecture of gene expression [20]. Unfortunately, 
the latter two methods are removed from natural envir- 
onment variation, and therefore may miss heritable 
expression that requires certain environmental condi- 
tions to manifest. 

Juvenile sockeye salmon exhibit a life-history dichotomy 
in their freshwater rearing environments; lake-type popu- 
lations rear in lakes for one to two years before travelling 
to the ocean to feed whereas riverine populations rear in 
river habitats for up to two years [in the riverine subset 
"sea-type", individuals go to sea before the first winter, 
whereas "river-type" spend at least one winter in the river- 
ine habitat; [24,25]]. Foraging, water current, and preda- 
tion differ between habitats [26,27]. Body shape differs 
between these life history types in association with the 
environment. In southwest Alaska, riverine sockeye exhibit 
a deep robust body whereas lake-type sockeye are more 
fusiforme [27]. This may be the result of both predation 
regime and a foraging strategy favoring burst swimming in 
riverine and continuous swimming in lake-type habitats. 
Similar morphological and behavioral differences are 
apparent within and among different species of Pacific sal- 
mon [28-31]. 

A set of recent studies characterized the transcriptome in 
ecotypes of another salmonid, the lake whitefish, employ- 
ing both the in situ and common garden approaches in 
dwarf and normal ecotypes [7,21]. The primary ecological 
trade-off between an increase in growth and fecundity in 
the normal ecotype is the increased energetics in the dwarf 
ecotype [7,32,33]. The dwarf ecotype exhibits continuous 
swimming for feeding and is subjected to high predation 
compared to the normal ecotype [7,33]. Therefore, both 
continuous swimming during plankton foraging and burst 
swimming during predator avoidance are likely favored in 
the dwarf ecotype, resulting in an energy expenditure for 
metabolism at the expense of growth [32,34]. This trade- 



off results in great differences in growth rate, age at matur- 
ity, body shape, and maximum lifespan. 

Phenotypically, lake whitefish ecotypes have drastically 
different sizes at the same age [34]. Although the size dis- 
tributions of the sockeye populations in this study overlap 
at the juvenile life stage, the riverine sockeye are longer 
and have a more robust body shape [27]. With less 
extreme morphological differences in ecotypes of sockeye 
salmon juveniles, we expect the molecular trade-offs to be 
different from the lake whitefish studies. We expect genes 
differentially expressed to reflect the differing emphasis on 
continuous swimming for lake-type and burst swimming 
for riverine. 

In this study, we compare the body morphology and in 
situ transcriptome of two sockeye salmon populations in 
the same drainage that exhibit these divergent life his- 
tories. Differences in foraging strategy and predation may 
have led to genetic differences between these populations 
[35] . We expect the transcriptome to reflect the functional 
molecular trade-offs driven by the ecological differences in 
these life histories. A greater understanding of the molecu- 
lar mechanisms that relate to functional ecology will 
enhance our understanding of the phenotypic diversity of 
this species, as well as place specific gene annotations into 
an ecological context. 

Results 

Morphology 

Albert Johnson Creek (AJC) juvenile sockeye were differ- 
ently shaped compared to the Surprise Lake (SL) popula- 
tion, indicated in the significant population term of our 
model (df = 19/348; F = 6.10; P < 0.001). Generally, SL 
juveniles were streamlined compared with the robust 
shape of AJC juveniles (Figure 1). The interaction term 
centroid size x population was not significant and there- 
fore not included in the final model. 

Microarray expression profiles 

The microarray analysis indicates 141 transcripts with sig- 
nificant differential expression (average fold change (FC) > 
1.5) between individuals from the riverine and lake popu- 
lations (t-test with Benjamini Hochberg FDR multiple test 
correction [MTC]; P < 0.05). Of these, 81 were over- 
expressed in AJC compared with SL (Table 1) while 60 
were over-expressed in SL compared with AJC (Table 2). 
The fold differences were modest, most of which were 
below two-fold. In AJC, the genes with the highest over- 
expression with corrected P values were: type II keratin El 
(Genbank:CB510619; FC 2.1; P = 0.033), kinesin-like pro- 
tein (Genbank:CB491150; FC 2.1 P = 0.033), and CCAAT/ 
enhancer-binding protein (Genbank:CA050914; FC 1.9; 
P = 0.044). In SL the genes with the highest over-expres- 
sion were: structural maintenance of chromosomes pro- 
tein IB (Genbank:CB488712; FC 2.9; P = 0.024), CD81 
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Surprise Lake 



Figure 1 Landmarks used and deformation grids of Albert Jolinson Creels (AJC) and Surprise Lal<e (SL) fish The middle panel depicts 
the locations of the twelve landmarks used in the geometric morphometric comparison of body shape, the top and bottom panels depect the 
details of the shape differences of these populations from ecolgically different habitat between AJC (top) and SL (bottom). 



antigen (Genbank:CA039936; FC 2.76; P = 0.033), collagen 
alpha-2(I) chain precursor (Genbank:CB515159; FC 2.36, 
P = 0.028), ferritin, heavy subunit (Genbank:CB505886; FC 
2.33, P = 0.044), and troponin I, slow skeletal muscle 
(Genbank:CB509964; FC 2.29; P = 0.039). Overall, the 
cGRASP annotation file matched our checking of EST 
sequences in Megablast and tBLASTx quite well, but 6 of 
99 genes from the differentially expressed lists with dis- 
tinct descriptions (not containing words like "unknown" 



or "predicted") were found to have different descriptions 
(Additional File 1, Table SI), many of which have been 
submitted to NCBI databases recently. 

In order to expand differential expression lists to facil- 
itate functional analysis, the stringent multiple test cor- 
rection was removed during significance testing, and 
transcripts that showed any expression difference were 
included in this analysis {P < 0.05). As a result, 1026 
genes were found significantly differentially expressed in 



Table 1 Genes significantly over-expressed in Albert Johnson Creek (AJC) sockeye salmon muscle compared to Surprise Lake (SL) 



Function 


Fold Change (Up in 
AJC) 


P Value 
Raw 


P Value 
MTC 


Normalized 
AJC 


StdDev 
AJC 


Cell division 


1.518 


0.003 


0.033 


1.160 


0.447 


Cell division 


1.678 


0.008 


0.043 


1.504 


0.936 


Cell division 


1.778 


0.011 


0.044 


1.592 


0.849 


Cell structure 


2.118 


0.003 


0.033 


1.422 


1.252 


Development 


1.644 


0.002 


0.033 


1.162 


0439 


Development 


1.755 


0.004 


0.036 


1.122 


0.516 


DNA Replication 


1.738 


0.007 


0.042 


1.107 


0.631 


UNA nepiication 


1 ./zz 


U.UUo 






U.003 


Energetic 


1.601 


0.014 


0.050 


1.287 


0.662 


Metabolism 












Energetic 


1.796 


0.013 


0.048 


1.262 


0.885 


Metabolism 












Energetic 


1.685 


0.007 


0.042 


1.552 


0.764 


Metabolism 












Immune 


\ .jy/ 


n nno 
u.uuy 




1 1 

1 . 1 OZ 


U./ jZ 


Immune 


1.893 


0.002 


0.033 


1.293 


0.663 


Immune 


1.722 


0.003 


0.033 


1.096 


0.548 


Iron Binding 


1.540 


0.012 


0.045 


1.134 


0.400 


Iron Binding 


1.657 


0.004 


0.033 


0.981 


0.449 


Mitochondria 


1.873 


0.007 


0.043 


1.428 


0.953 


Mitochondria 


1.528 


0.001 


0.033 


1.092 


0.365 


Protein folding 


1.644 


0.006 


0.041 


1.210 


0.548 


Protein transport 


1.911 


0.003 


0.044 


1.095 


0.697 


Protein transport 


1.674 


0.006 


0.041 


1.237 


0.702 


Protein transport 


2.103 


0.001 


0.033 


1.263 


0.669 


Signal 


1 ^AA 


n nn^ 


n^^ 


1 1 7 A 
. 1 / ^ 


U.J 1 u 


Transcription factor 


1 

.UJZ 


n ni 1 
u.u 1 1 


U.U44 




n A QO 


(neg) 












Transcription 


1.974 


0.009 


0.044 


1.335 


1.062 


regulation 












Transcription 


1.903 


0.002 


0.033 


1.210 


0.667 


regulation 












Translation 


1.518 


0.001 


0.033 


1.067 


0.301 


Translation 


1.967 


0.003 


0.033 


1.196 


0.714 


Translation 


1.528 


0.001 


0.033 


1.297 


0.504 


Translation 


1.613 


0.001 


0.033 


1.252 


0.544 



Normalized StdDev Genbank Description 



SL SL 

0.765 0.261 CA059189 Golgi reassembly-stacking protein 2 

0.896 0.389 CA060603 Katanin p80 WD40-containing subunit Bl 

0.895 0.593 CB515443 Nuclear autoantigenic sperm protein 

0.671 0.361 CB510619 Oncorhynchus mykiss mRNA for type II keratin El (El gene) 

0.707 0.297 CA052389 Plexin-A3 precursor 

0.639 0.354 CB498195 Reticulon-4 

0.637 0.335 CA053921 Girdin 

0.892 0.375 CB494447 Poly [ADP-ribose] polymerase 14 

0.804 0413 CA042459 Salmo salar Na, K-ATPase alpha subunit isoform 1 b/ii 

(ATPlAlB/ii) 

0.703 0.436 CA064415 Methionine-R-sulfoxide reductase 

0.921 0.482 CB503498 Creatine kinase, testis isozyme 

0.740 0.256 CA048654 Oncorhynchus mykiss SYPG1 MHCII-alpha and Raftlin-like 

pseudogenes 

0.683 0.392 CA063034 Transient receptor potential cation channel subfamily M 

member 4 

0.636 0.293 CB492871 Oncorhynchus mykiss mRNA for MHC class II alpha (onmy- 

DAA*02 gene) 

0.736 0.372 CB494485 Ferritin, heavy subunit 

0.592 0.263 CB501208 Hemoglobin subunit alpha-4 

0.763 0472 CB489874 TIM21-like protein, mitochondrial precursor 

0.714 0.217 CB498084 NADH dehydrogenase [ubiquinone] 1 alpha subcomplex 

subunit 4 

0.736 0.359 CA052325 Prefoldin subunit 2 

0.573 0.284 CA051435 Ras-related protein Rab-14 

0.739 0.323 CA054426 ADP-ribosylation factor 1 

0.600 0.366 CB491150 Kinesin-like protein KIF20A 

0.760 0.235 CB514723 G-protein coupled receptor APJ homolog 

0.787 0.479 CB499801 Pre-B-cell leukemia transcription factor 2 

0.676 0.460 CA050914 CCAAT/enhancer-binding protein delta 

0.636 0.348 CB498272 Nucleolar protein 5A 

0.703 0.217 CA037622 60S ribosomal protein L36 

0.608 0.381 CA045500 60S ribosomal protein L9 

0.849 0.226 CA055741 60S ribosomal protein L9 

0.776 0.250 CA770261 60S ribosomal protein L23 
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Translation 


T538 


0.013 


0.048 


0.944 


0.433 


0.614 


0.285 


CA770402 


60S ribosomal protein LI 5 


Translation 


T538 


0.007 


0.042 


1.069 


0.256 


0.695 


0.336 


CB493600 


40S ribosomal protein S30 


Translation 


T711 


0.002 


0.033 


1377 


0.739 


0.805 


0.306 


CB493907 


40S ribosomal protein SI 9 


Translation 


T632 


0.006 


0.040 


1.137 


0.434 


0.697 


0.352 


CB503205 


60S ribosomal protein L4-B 


Translation 


T719 


0.009 


0.044 


0.908 


0.529 


0.528 


0.289 


CB5 14402 


60S ribosomal protein LI 9 


Translation 


T717 


0.014 


0.050 


1.555 


1.267 


0.905 


0.376 


CK990280 


60S acidic ribosomal protein PI 


Transmembrane 


T536 


0.006 


0.042 


1.285 


0.591 


0.836 


0.310 


CA041663 


pfam07690, IVIFS.I, Major Facilitator Superfamily 


Many functions 


T526 


0.002 


0.033 


1.206 


0.366 


0.791 


0.296 


CA047582 


Somatotropin precursor 


Many functions 


T502 


0.011 


0.044 


1.125 


0.407 


0.749 


0.338 


CA052412 


Ectodysplasin-A 


Many functions 


T630 


0.010 


0.044 


1.154 


0.613 


0.708 


0.350 


CA061568 


Nucleophosmin 


Many functions 


T595 


0.014 


0.050 


1326 


0.799 


0.782 


0.451 


CB503191 


Anterior gradient protein 2 homolog precurser 


Many functions 


T614 


0.011 


0.044 


1.147 


0.668 


0.711 


0.309 


CB5 15428 


Tartrate-resistant acid phosphatase type 5 precursor 


Other functions 


T762 


0.003 


0.033 


1.234 


0.769 


0.701 


0.287 


CA044554 


Prothymosin alpha 


Other functions 


T677 


0.004 


0.033 


1.283 


0.734 


0.765 


0.290 


CA055729 


Transmembrane protein 178 precursor 


Other functions 


T507 


0.010 


0.044 


1.002 


0.406 


0.665 


0.286 


CA058231 


Antolefinin 


Other functions 


T518 


0.006 


0.041 


1.141 


0.647 


0.752 


0.128 


CA059713 


Probable RNA-directed DNA polymerase from transposon BS 


Other functions 


T842 


0.011 


0.044 


1.594 


0.965 


0.865 


0.599 


CB490454 


Zona pellucida sperm-binding protein 3 precursor 


Other functions 


T630 


0.003 


0.033 


1.401 


0.569 


0.860 


0.369 


CB492596 


Proteasome subunit alpha type 4 


Other functions 


T614 


0.008 


0.044 


1.164 


0.582 


0.721 


0.333 


CB494074 


Oncorhynchus mykiss clone Glan 1 transposon e 


Other functions 


T574 


0.012 


0.046 


1333 


0.796 


0.847 


0.323 


CB497649 


Nucleoside diphosphate kinase B 


Other functions 


T528 


0.012 


0.046 


1.212 


0.637 


0.793 


0.301 


CB498610 


UNKNOWN 


Other functions 


T640 


0.003 


0.033 


1.289 


0.622 


0.786 


0.305 


CB499656 


Dual specificity mitogen-activated protein kinase kinase 6 


Other functions 


T775 


0.005 


0.040 


1.266 


0.692 


0.713 


0.389 


CB511161 


Voltage-dependent anion-selective channel protein 2 


Other functions 


T752 


0.007 


0.043 


1.073 


0.533 


0.613 


0.365 


CB511880 


Serine/threonine-protein kinase Haspin 


Other functions 


T513 


0.001 


0.033 


1.061 


0316 


0.701 


0.221 


CB514112 


Serine/threonine/tyrosine-interacting protein 


Other functions 


T536 


0.005 


0.038 


1.238 


0.600 


0.806 


0.250 


CB5 14435 


Williams-Beuren syndrome chromosome region 15 protein 
homolog 


Unknown 


1.657 


0.011 


0.044 


1.156 


0416 


0.698 


0.430 


CA039908 


UNKNOWN 


Unknown 


1538 


0.002 


0.033 


1.402 


0.577 


0.912 


0.258 


CA039963 


UNKNOWN 


Unknown 


1.765 


0.001 


0.033 


1.627 


0.789 


0.921 


0.367 


CA040487 


PREDICTED: similar to Keratin-associated protein 10-1 


Unknown 


1.674 


0.010 


0.044 


1.113 


0.748 


0.665 


0.284 


CA041505 


UNKNOWN 


Unknown 


1.607 


0.003 


0.033 


1.212 


0.436 


0.754 


0.338 


CA050842 


UNKNOWN 


Unknown 


1,715 


0.005 


0.040 


1.368 


0.802 


0.798 


0.361 


CA051475 


UNKNOWN 


Unknown 


1,723 


0.003 


0.033 


1307 


0.686 


0.758 


0.326 


CA054597 


UNKNOWN 


Unknown 


1,565 


0.003 


0.033 


1.001 


0430 


0.639 


0.218 


CA057262 


PREDICTED: similar to Rsbnl protein [Danio rerio] 


Unknown 


1.872 


0.000 


0.033 


1.204 


0.261 


0.643 


0.321 


CA058259 


UNKNOWN 


Unknown 


1.537 


0.011 


0.044 


1350 


0.696 


0.878 


0.346 


CA058522 


UNKNOWN 


Unknown 


1.698 


0.004 


0.033 


0.956 


0321 


0.563 


0.304 


CA061924 


UNKNOWN 


Unknown 


1.891 


0.012 


0.047 


1.581 


1414 


0.836 


0484 


CB490094 


UNKNOWN 
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Unknown 


1,558 


0.005 


0.040 


1.143 


0.493 


0.734 


0.294 


CB492336 


UNKNOWN 


Unknown 


1.634 


0.001 


0.033 


1.291 


0.479 


0.790 


0.280 


CB492905 


UNKNOWN 


Unknown 


1.660 


0.008 


0.044 


1.254 


0.653 


0755 


0.380 


CB494690 


UNKNOWN 


Unknown 


1.695 


0.001 


0.033 


1.297 


0.534 


0.765 


0.281 


CB497128 


UNKNOWN 


Unknown 


1.816 


0.009 


0.044 


1.431 


0.997 


0.788 


0.448 


CB500083 


UNKNOWN 


Unknown 


1.730 


0.010 


0.044 


1.218 


0459 


0.704 


0476 


CB509719 


PREDICTED: similar to CC chemokine SCYA103 [Danio rerio] 


Unknown 


1,693 


0.010 


0.044 


1.243 


0.634 


0.734 


0417 


CB5 10709 


UNKNOWN 


Unknown 


1.626 


0.002 


0.033 


1.365 


0.590 


0.840 


0.317 


CB511393 


UNKNOWN 


Unknown 


1.603 


0.008 


0.043 


1.193 


0.606 


0.744 


0.327 


CB511789 


PREDICTED: similar to Protein C14orf159 mitochondrial 
precursor 


Unknown 


1.619 


0.013 


0.047 


1.195 


0.818 


0738 


0.271 


CB5 13822 


UNKNOWN 


Unknown 


1.604 


0.011 


0.044 


1.273 


0.687 


0.794 


0.367 


CB5 14071 


UNKNOWN 


Unknown 


1.605 


0.009 


0.044 


1.069 


0.625 


0.666 


0.263 


CK990538 


UNKNOWN 


Unknown 


1.501 


0.008 


0.043 


1.158 


0.590 


0772 


0.227 


CK991114 


UNKNOWN 



Genes with significantly greater expression in Albert Johnson Creek (AJC) sockeye salmon muscle compared to Surprise Lake (SL) sockeye salmon muscle. We present the gene description, the raw P values, the 
multiple test corrected P values, the average normalized expression values and standard deviations for each population, the Genbank ID and gene description. 



Table 2 Genes significantly over-expressed in Surprise Lake sockeye salmon muscle compared to Albert Johnson Creek. 



Function 


Fold change (Up 


P value 


P Value 


Normalized 


StdDev 


Normalized 


StdDev 


Genbank 


Description 




in SL) 


raw 


MTC 


AJC 


AJC 


SL 


SL 






Aerobic infrastructure 


2.208 


0.002 


0.024 


0.667 


0465 


1.474 


1.039 


CB5 10651 


72 kDa type IV collagenase precursor 


Cell interactions 


2.755 


0.004 


0.033 


0.625 


0.571 


1.720 


2.017 


CA039936 


CD81 antigen 


Cell interactions 


T698 


0.003 


0.028 


0.648 


0.303 


1.100 


0.497 


CB488646 


Cysteine-rich protein 1 


Ce interactions 


T543 


0.002 


0.023 


0.777 


0.307 


1.199 


0.377 


CB498736 


Cysteine-rich protein 1 


Cell interactions 


2.358 


0.003 


0.028 


0.764 


0.687 


1.803 


1.383 


CB515159 


Collagen alpha-2(l) chain precursor 


DNA 


T610 


0.001 


0.020 


0.789 


0.314 


1.272 


0425 


CA051642 


Ribonucleoside-diphosphate reductase M2 subunit 


DNA 


2.907 


0.002 


0.024 


0.658 


0.399 


1.910 


2.344 


CB488712 


Structural maintenance of chromosomes protein IB 


DNA 


2.252 


0.000 


0.005 


0.706 


0.288 


1.589 


0.805 


CB490371 


Histone H3.3 


Energetic Metabolism 


1.825 


0.006 


0.036 


0.836 


0.275 


1.527 


1.051 


CA041073 


Calcium/calmodulin-dependent protein kinase type II delta 
chain 


Energetic Metabolism 


T508 


0.005 


0.036 


0.792 


0.363 


1.194 


0.376 


CA049006 


SUMO-activating enzyme subunit 2 


Energetic Metabolism 


T534 


0.001 


0.020 


0.703 


0.293 


1.078 


0.280 


CA061459 


cAMP-dependent protein kinase, beta-2-catalytic subunit 


Energetic Metabolism 


T553 


0.001 


0.020 


0.828 


0.308 


1.286 


0.396 


CA064428 


6-phosphogluconate dehydrogenase, decarboxylating 


Energetic Metabolism 


T684 


0.003 


0.028 


0.902 


0.325 


1.518 


0.759 


CB498472 


Selenide, water dikinase 2 


Iron Binding 


2.326 


0.010 


0.044 


0.730 


0.661 


1.697 


1.751 


CB505886 


Ferritin, heavy subunit 


Lipid catabolism 


1.580 


0.004 


0.033 


0.806 


0.271 


1.273 


0.578 


CA038195 


Phospholipase A2, acidic 1 precursor 


Mitochondria 


2.053 


0.003 


0.028 


0.861 


0.707 


1.766 


0.930 


CA056742 


Carnitine 0-palmitoyltransferase 2, mitochondrial precursor 


Mitochondria 


2.083 


0.008 


0.039 


0.680 


0.336 


1.416 


1.308 


CA058136 


5-aminolevulinate synthase, nonspecific, mitochondrial 




















precursor 


Mitochondria 


1.745 


0.010 


0.044 


0.683 


0.382 


1.192 


0.710 


CA060285 


Adenylate kinase isoenzyme 2, mitochondria 


Mitochondria 


1.623 


0.005 


0.036 


0.877 


0.354 


1.424 


0.682 


CA06201 7 


Single-stranded DNA-binding protein, mitochondrial precur; 


Muscle contraction 


2.294 


0.007 


0.039 


0.674 


0486 


1.548 


1.579 


CB509964 


Troponin 1, slow skeletal muscle 


regulation 




















Muscle contraction 


1.613 


0.010 


0.044 


0.807 


0.386 


1300 


0.650 


CB510901 


Troponin 1, slow skeletal muscle 


regulation 




















Organelle movement 


1.672 


0.001 


0.020 


0.735 


0.228 


1.230 


0.557 


CB498105 


Tubulin alpha-IC chain 


Protein breakdown 


1.504 


0.005 


0.035 


0.827 


0409 


1.245 


0.307 


CK990590 


Trypsin precursor 


Protein breakdown 


1.520 


0.006 


0.037 


0.815 


0.369 


1.238 


0.436 


CK991067 


Cathepsin H precursor 


Protein transport 


1.575 


0.012 


0.047 


0.836 


0.365 


1.316 


0.662 


CA042407 


Protein disulfide-isomerase A6 precursor 


Protien transport 


1.570 


0.001 


0.020 


0.683 


0.322 


1.072 


0.211 


CA044589 


Protein transport protein Sec61 subunit beta 


Sugar binding 


1.515 


0.008 


0.039 


0.799 


0.309 


1.211 


0.504 


CB505852 


serum lectin isoform 3 precursor [Salmo salar] 


Transcription (neg) 


1.961 


0.012 


0.047 


0.690 


0480 


1352 


1.070 


CA054647 


Selenoprotein K 


Transcription regulation 


1.592 


0.008 


0.039 


0.739 


0.336 


1.176 


0.540 


CB494071 


14-3-3-like protein GF14-F 


Translation 


1.590 


0.001 


0.020 


0.685 


0.340 


1.088 


0.223 


CA061402 


Pseudouridylate synthase 7 homolog 


Translation 


1.721 


0.000 


0.005 


0.753 


0.242 


1.294 


0.407 


CB516915 


Eukaryotic translation initiation factor 3 subunit 7 


Transport 


2.037 


0.003 


0.027 


0.687 


0.444 


1.400 


0.874 


CB496796 


Transmembrane emp24 domain-containing protein 3 




















precursor 


Transport 


1.880 


0.009 


0.044 


0.704 


0.449 


1.323 


0.920 


CBS 12385 


Vacuolar protein sorting-associated protein 41 homolog 


Many functions 


1.704 


0.000 


0.002 


0.725 


0.209 


1.235 


0.311 


CA048664 


Protein C-ets-1 



Table 2 Genes significantly over-expressed in Surprise Lake sockeye salmon muscle compared to Albert Johnson Creek. (Continued) 



Many functions 


1.845 


0.000 


0.010 


0.685 


0.241 


1.264 


0.520 


CA050496 


Transitional endoplasmic reticulum ATPase 


Many functions 


1.678 


0.006 


0.037 


0.882 


0.447 


1.479 


0.729 


CB5 15873 


Dihydropyrimidinase 


Other functions 


1.855 


0.006 


0.037 


0.858 


0439 


1.593 


1.058 


CA044775 


UNKNOWN 


Other functions 


2.105 


0.006 


0.037 


0.739 


0420 


1.556 


1.360 


CA051433 


Transmembrane and ubiquitin-like domain-containing protein 
2 


Other functions 


2.016 


0.005 


0.035 


0.812 


0480 


1.638 


1.205 


CA056436 


UNKNOWN 


Other functions 


1.645 


0.000 


0.018 


0.880 


0.292 


1.449 


0.514 


CA056626 


Lithognathus mormyrus clone Imos2p08h02 mRNA sequence 


Other functions 


1.502 


0.001 


0.020 


0.790 


0.284 


1.187 


0.321 


CA062511 


Translocon-associated protein subunit alpha precursor 


Other functions 


1.976 


0.004 


0.033 


0.579 


0.396 


1.144 


0.695 


CB498745 


KH domain-containing, RNA-binding, signal transduction- 
associated protein 1 


Other functions 


1.570 


0.012 


0.049 


0.878 


0.241 


1.379 


0.785 


CB5 16580 


Kelch-like protein 6 


Other functions 


1.692 


0.004 


0.033 


0.851 


0.335 


1.440 


0.741 


CB516919 


Extracellular matrix protein 1 precursor 


Unknown 


1.536 


0.010 


0.044 


0.838 


0420 


1.289 


0.512 


CA040470 


UNKNOWN 


Unknown 


1.789 


0.001 


0.020 


0.723 


0.286 


1.294 


0.650 


CA041684 


UNKNOWN 


Unknown 


1.623 


0.007 


0.039 


0.881 


0.511 


1.429 


0.555 


CA054094 


UNKNOWN 


Unknown 


1.656 


0.001 


0.020 


0.747 


0.326 


1.237 


0420 


CA058336 


PREDICTED: similar to Transforming growth factor, beta- 
induced [Danio rerio] > 


Unknown 


1.563 


0.001 


0.020 


0.629 


0.224 


0.982 


0.332 


CA060640 


UNKNOWN 


Unknown 


1.634 


0.008 


0.039 


0.879 


0454 


1.437 


0.670 


CA767983 


PREDICTED: similar to small inducible cytokine SCYA105 [Danio 
rerio] 


Unknown 


1.534 


0.001 


0.020 


0.863 


0.332 


1.324 


0.362 


CB492594 


UNKNOWN 


Unknown 


1.541 


0.008 


0.039 


0.782 


0.296 


1.204 


0.544 


CB493987 


UNKNOWN 


Unknown 


1.808 


0.004 


0.033 


0.709 


0484 


1.282 


0.571 


CB502569 


UNKNOWN 


Unknown 


1.575 


0.009 


0.044 


0.855 


0.372 


1.346 


0.640 


CB505933 


UNKNOWN 


Unknown 


1.585 


0.002 


0.024 


0.793 


0.395 


1.257 


0.322 


CB506647 


UNKNOWN 


Unknown 


1.517 


0.010 


0.044 


1.005 


0.575 


1.527 


0.437 


CB5 15453 


UNKNOWN 


Unknown 


1.543 


0.010 


0.044 


0.758 


0.338 


1.168 


0.517 


CB5 16051 


UNKNOWN 


Unknown 


1.727 


0.001 


0.020 


0.837 


0479 


1.445 


0424 


CB5 16202 


Sterile alpha motif domain-containing protein 9-like 


Unknown 


1.570 


0.007 


0.039 


0.828 


0.346 


1.299 


0.578 


CK991103 


PREDICTED: similar to transposase [Strongylocentrotus 
purpuratus] 


Unknown 


1.751 


0.000 


0.010 


0.712 


0.280 


1.246 


0.433 


CK991281 


UNKNOWN 



Genes with significantly greater expression in Surprise Lake (SL) sockeye salmon muscle compared to Albert Johnson Creek (AJC) sockeye salmon muscle. We present the gene description, the raw P values, the 
multiple test corrected P values, the average normalized expression values and standard deviations for each population, the Genbank ID and gene description. 
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muscle. Of these, 498 genes were expressed at higher (or 
over-expressed) levels in AJC compared with SL. Of 
these over-expressed transcripts, 230 and 240 were 
annotated with biological process and molecular func- 
tion Gene Ontology (GO) terms, respectively. In all 
cases we used the GO Slim dataset. In biological pro- 
cess, biosynthesis (GO:9058; P = 0.009) and behavior 
(GO:7610; P = 0.019) were the only GO categories sig- 
nificantly enriched (Table 3). In the molecular function 
ontology, the only enriched category was structural 
molecular activity (GO:5198; 43 genes, P < 0.001). 

We found 528 genes significantly expressed at higher 
levels in SL muscle compared to AJC (P < 0.05, no MTC). 
Of this list, 267 and 313 features were annotated with bio- 
logical process and molecular function GO terms, respec- 
tively. In this analysis, metabolism (GO:8152) was the only 
biological process category significantly enriched {P = 
0.019), containing 192 genes. There are six significant 
categories enriched from the molecular function category 
(Table 3). 

Reverse-Transcription Quantitative Polymerase Chain 
Reaction (RT-qPCR) 

As 40s ribosomal and 5-amino-levuleninic acid synthase 
(Genbank:CB493907 and CA058136) were identified as 
the most stable normalizer candidates through the 



geNORM algorithm, these transcripts were used to gen- 
erate relative expression ratios of genes of interest 
(GOI). The four GOI's are 72 kDa type IV collagenase 
precursor (Genbank:CB510651), troponin I, slow skeletal 
muscle (Genbank:CB510901), single-stranded DNA- 
binding protein, mitochondrial precursor (Genbank: 
CA062007), and malate dehydrogenase (Genbank: 
CA044864). Two of the four investigated genes were dif- 
ferentially expressed, 72 kDa type IV collagenase precur- 
sor and troponin I, slow skeletal muscle, both 
significantly over-expressed in SL juveniles, presented in 
Figure 2. 72 kDa type IV collagenase precursor was 
highly significant (FC > 2; p = 0.00013). Troponin I, 
slow skeletal muscle displayed a high level of biological 
variation among biological replicates, as can be viewed 
by the large 95% confidence intervals for this GOI (Fig- 
ure 2). 

Discussion 

We have characterized molecular phenotypes in muscle 
tissue that relate to morphology, life history, and ecology 
in sockeye salmon. We also discovered differentially 
expressed genes and enriched functional categories asso- 
ciated with differing morphology and life history types 
of sockeye salmon in two habitats. This work represents 
the first characterization of a molecular phenotype in 



Table 3 Gene Ontology (GO) enrichment results. 


Category 


P 


Genes in GO category 


% of differentially expressed 


Genes in GO 


% genes on array in 




value 


over-expressed 


genes in GO category 


category on array 


GO category 


Over-expressed in AJC: 












Biological process 












GO:9058: biosynthesis 


0.009 


52 


2441 


1264 


17.82 


GO:7610: behavior 


0.019 


10 


4.695 


156 


2.2 


Over-expressed in AJC: 












Molecular function 












GO:5198: structural molecule 


< 


43 


17.92 


750 


9.043 


activity 


0.001 










Over-expressed in SL: 












Biological process 












GO:8152: metabolism 


0.019 


192 


71.91 


4674 


65.91 


Over-expressed in SL: 












Molecular function 












GO:16209: antioxidant activity 


0.013 


6 


1.917 


52 


0.627 


GO:8135: translation factor 


0.010 


13 


4153 


166 


2.001 


activity, nucleic acid binding 












GO:45182: translation regulator 


0.014 


13 


4153 


173 


2.086 


activity 












GO:5489: electron transporter 


0.005 


17 


5.431 


225 


2.713 


activity 












GO:8233: peptidase activity 


0.043 


28 


8.946 


529 


6.378 


GO:3824: catalytic activity 


0.001 


166 


53.04 


3645 


43.95 



These are the significant GO Slim categories representing both biological process and molecular function ontologies for population specific significantly over- 
expressed {P < 0.05; no multiple test correction} features. For each significant GO category, we include the P value number of over-expressed genes in that GO, 
percentage of representation in the over-expressed list, number of features of that GO in the microarray, and percentage of representation on the entire 
microarray. 
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1.75 



I 1.25 



□ Lake-type 

□ Riverine 



72kDa TropS SinMit Malate 

Gene of Interest 

Figure 2 RT-qPCR transcript profiles displaying mean 
normalized quantities in the lake-type and riverine ecotypes 

"72 kDa" is 72 l<Da type IV collagenase precursor {Genbanl<:CB510651), 
'TropS" is troponin I, slow skeletal muscle (Genbank:CB510901), and 
"SinlVlit" is single-stranded DNA-binding protein, mitocliondrial 
precursor (Genbank:CA062007). Expression is relative to the 
geometric mean of expression levels of normalizers 40S ribosomal 
and 5-aminolevulinate synthase (Genbank:CB493907 and CA058136). 
Malate dehydrogenase (Genbank:CA044864) is included as an 
example a feature that was not significantly differentially regulated 
with the microarray analysis. Significance was determined by a one- 
tailed Mann-Whitney (J test, 'denotes p < 0.05 "denotes p < 0.001. 



muscle or any other tissue of juvenile sockeye between 
these common habitat types. Because these populations 
are in relatively pristine habitats, these ecologically 
based gene expression differences provide a reference 
for published and future studies of sockeye salmon in 
habitats more impacted by human activities [17,36,37] 

Riverine sockeye have a deep, robust body compared 
with the lake-type life history [27] . We find this pattern 
between the AJC and SL populations (Figure 1). In paral- 
lel, some patterns in expression profiles in the present 
study reflect these phenotypes. For example, in AJC, ten 
ribosomal proteins were over-expressed compared with SL 
and one of these (Genbank:CA045500) was among the 
highest over-expressed in AJC (Table 1). In comparison, 
we did not identify any ribosomal proteins over-expressed 
in SL compared with AJC (Table 2). All of these features 
on the array map to different contigs with the cGRASP 
Expressed Sequence Tag (EST) clustering online tool and 
therefore are likely to represent different genes. Many 
ribosomal proteins stabilize the structure composed 
mostly of ribosomal RNA [38]. Thus, the differential 
expression of these ribosomal proteins may indicate more 
protein synthesis in the muscle tissue of AJC sockeye. In 
addition, five genes associated with cell division, DNA 
replication and a growth hormone gene were over- 
expressed in AJC (Table 1 and Additional File 1, Table SI). 



These patterns are consistent with faster growth and more 
muscle mass associated with the deeper body morphology 
in AJC sockeye [27]. The GO category of biosynthesis 
(GO:9058) is defined as "The chemical reactions and path- 
ways resulting in the formation of substances; typically the 
energy- requiring part of metabolism in which simpler sub- 
stances are transformed into more complex ones" (http:// 
amigo.geneontology.org). This was an enriched category in 
AJC of the GO Slim biological process ontology which 
further underscores that the expression profile in AJC cor- 
responds to increased biomolecule production. 

Creatine kinase (Genbank:CB503498) was over- 
expressed in AJC compared with SL. This gene is poten- 
tially important in both aerobic respiration in the pathway 
of oxidative phosphorylation, as well as anaerobic metabo- 
lism in glycolosis [39]. However in both processes this 
gene regulates the amount of available cellular ATP so it 
facilitates fluctuating energy demands [39] . This may be 
important to the riverine "wait and burst" feeding style of 
AJC, which may involve more variable levels of feeding 
activity. 

In the SL gene enrichment analysis, the sole significant 
generic GO Slim ontology is metabolism (GO:8152). Also, 
one of the significant GO terms in the molecular function 
ontology is electron transport indicating aerobic respira- 
tion. Many of the individual genes in the over- expressed 
list relate to energy metabolism, mitochondria, and muscle 
contraction regulation. This is compatible with increased 
metabolism, especially for continuous swimming. Several 
of these genes may be particularly important for the con- 
tinuous swimming strategy of lake-type sockeye. Troponin 
I, slow skeletal muscle (present twice in the over-expressed 
list; Genbank:CB509964 and CB510901) is a gene that reg- 
ulates muscle contraction and the "slow" label of the anno- 
tation indicates that this transcript is specific to slow 
twitch or aerobic muscle fiber [40] . We confirmed with the 
cGRASP EST clustering database that these features map 
to different contigs and therefore likely represent two dif- 
ferent genes. The latter of these two genes was also found 
to be significantly over-expressed in our RT-qPCR analysis 
(Figure 2). These findings could be the result of either 
increased red muscle fibers present, increased recruitment 
of red muscle fiber, or both. Additionally, 72 kDa type IV 
collagenase precursor (Genbank:CB510651), was 
over-expressed in SL juveniles and is implicated in blood 
vascular remodelling [41]. This gene was also found to 
be over-expressed in our qPCR analysis. These may lay 
the infrastructure for increased aerobic needs. Another 
SL over-expressed gene, selenoprotein K (Genbank: 
CA054647), is a response to oxidative stress [42], which 
may occur in increased aerobic activity. We did not sepa- 
rate red and white muscle tissue in our experiment. Many 
fish species have the muscle fiber types distinctly separated 
and ecotypes may differ in their composition of red and 
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white muscle sections [43]. Pacific salmon, however, have 
red muscle fibers mixed in with the main white muscle 
mass [44]. We collected all of the main locomotion muscle 
tissue from each individual as we also wanted to capture 
gene expression differences due to different muscle fiber 
composition. 

Other significant GO terms have less of a clear func- 
tional relationship with the ecology of these populations. 
The translation factor activity (GO:8135 and GO:45182) 
terms represented in SL is composed of translation initia- 
tion factor genes. This is in contrast to over-expression of 
ribosomal proteins in AJC including 10 in the significantly 
over-expressed genes (Table 1) and 37 of the 52 genes in 
the biosynthesis (GO:9058) term (Table 3). It is also 
unclear why behavior (GO:7610) is a significant GO term 
enriched in AJC (Table 3). There are likely important 
behavioral differences between these populations, but the 
ten genes contained within this GO term, appear to be 
genes that have many divergent functions and the beha- 
vioural annotations are mostly related to mice. 

RT-qPCR results were concordant to the microarray 
results in three out of four cases. Additionally, in all 
four cases, the average expression level was in the same 
direction for the RT-qPCR and microarray assays (Addi- 
tional File 2, Table S2). Our sample size was smaller 
with the RT-qPCR (SL: n = 9 and AJC n = 10) and this 
may have resulted in a reduction of power compared 
with the microarray assay and the lack of significance 
agreement in one of the four comparisons. 

An unanticipated discovery was the increased expression 
of immune function genes in AJC, including two features 
annotated with MHC II function (Genbank:CB492871 and 
CA048654; Table 1). This may indicate differing immunity 
challenges in the river and lake rearing habitats of this 
study. This finding is a good example of indirect hypoth- 
esis generation that can come from using such large data- 
set producing tools. As microarrays facilitate the screening 
of a large number of genes they may uncover unexpected 
traits that are difficult to measure, even if not identified as 
potential traits of interest during experimental design [20]. 

We detected differential regulation of select regulatory 
genes over-expressed in AJC including two transcription 
factors. Pro-B-cell leukemia transcription factor 2 (Gen- 
bank:CB499801) and CCAAT/enhancer-binding protein 
delta (Genbank:CA050914) regulate transcription [45]. 
The latter has the second highest over-expression fold 
change in AJC. In SL, one gene annotated as "unknown" 
in the cGRASP annotation file, was identified through 
the re-BLAST methods as "far upstream binding ele- 
ment protein 3" (Entrez Gene ID 100194998). These 
regulatory genes could have cascading effects in gene 
expression [46], and their roles in these ecotypes should 
be investigated further. 



Whether due to recent sequence submissions [47], or 
through the challenges of assembling large EST datasets 
in individuals with recent genome duplications [48,49], a 
few annotations of differentially expressed genes varied 
from the originally released 16K annotation file [50]. 
The genes with new annotation can be viewed in Addi- 
tional File 1, Table SI. One example that our individual 
BLAST efforts identified was 60 S ribosomal protein 
L14 (BT060370.1). Also, we identified another gene as 
the antifreeze protein, type 2 ice-structuring protein 
(Entrez Gene ID: 100195780). This gene has obvious 
ecological implications for the colonization of new lakes 
and may have been especially important in post-glacial 
lakes. These two different BLAST hits were very recently 
annotated 25-August 2010 [47]. These few difference 
between the cGRASP annotation file and current blast 
hits underscore both the computational complexity of 
assembling genomes and the constantly changing knowl- 
edge of gene function. 

Both lists of differentially expressed transcripts contain 
many unknown function annotations, and although we 
cannot assign any molecular function to these genes based 
on this study, we do now have ecological context for these 
genes. Furthermore, as more genes are annotated, we may 
gain more insight on the role of these unknowns in the 
ecotype variances, as was the case with "far upstream 
binding element protein 3" as described above. 

Our results yield both similarities and differences when 
compared to the gene expression work on lake whitefish 
[7,21]. The morphological and expected ecological differ- 
ences in juvenile sockeye salmon are not nearly as extreme 
as those observed in lake whitefish, which are drastically 
different in growth rate and age of maturity. However, like 
the present study, the fold change differences between 
ecotypes in the lake whitefish work with both microarrays 
and qPCR are modest, suggesting this may be the norm 
for ecological transcriptomic differences in natural popula- 
tions. Unlike lake whitefish, sockeye salmon are anadro- 
mous, and our study populations move to the ocean after 
freshwater rearing, where feeding environments and access 
to them may be similar [51]. Therefore, differences at the 
juvenile rearing stage may be limited, because this is only 
one part of a complex life history, and the life history 
types may developmentally converge for the ocean feeding 
stage. 

In lake whitefish, parvalbumin beta (Genbank: 
AF538283) was the only gene involved with muscle con- 
traction regulation that was consistently over-expressed 
in the dwarf ecotype. We did not find evidence of over- 
expression of this gene in SL, but another gene involved 
with muscle contraction, the slow twitch isomer of tro- 
ponin, was significantly over-expressed in SL. It is 
expected that feeding strategy promotes continuous 
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swimming in dwarf lake whitefish [34]. In addition, 
dwarf lake whitefish are under high predation compared 
with normal whitefish, an ecological attribute responsi- 
ble for increased burst swimming. This should favor 
both aerobic and anaerobic metabolism in the same eco- 
type resulting in selection favoring overall increased 
metabolism and muscle contraction [34]. In the present 
study, high predation and a burst swimming feeding 
strategy are expected only in riverine AJC, whereas a 
continuous swimming strategy and low predation should 
occur in lake-type SL. These differing scenarios of selec- 
tion may result in less-pronounced partitioning of swim- 
ming energetics in sockeye salmon compared with lake 
whitefish. 

In lake whitefish, many of the differentially expressed 
genes in nature retained differential expression when indi- 
viduals were raised in a common environment [21]. Also, 
gene mis-expression in lake whitefish dwarf x normal 
backcross is associated with reduced egg survival [52]. It is 
difficult to distinguish cause from effect in these situations, 
as the mis-expression in underdeveloped eggs may be the 
result of the underdeveloped phenotype and the cause 
may be in an unmeasured earlier stage of development 
[53] . In summary, in lake whitefish, gene expression traits 
have a genetic component and can affect traits important 
to reproduction. 

Other fish species also manifest the benthic/limnetic 
ecotypes including threespine stickleback [54], Dolly Var- 
den {Salvelinus malma) [55] and Arctic charr (5. alpinus) 
[56]. Though despite many behavioral, morphological, and 
genetic studies, relatively few of these important ecological 
model species have been investigated at the transcriptomic 
level. Elmer et al. [57] found non-synonymous divergence 
in ESTs related to biosynthesis, metabolism and develop- 
ment in South American crater lake cichlids {Amphilophus 
astorquii and A. zaliosus). Other studies of fish transcrip- 
tomics have focused on spawning survivorship [36] and 
salt/freshwater transitions [37,58] 

Our study has limitations in that we only present a 
single tissue type in a single point in time for these 
populations. Also, the morphological sampling and the 
gene expression sampling took place in different years, 
though we expect that the morphological differences are 
temporally stable, at least in the time scale between the 
two sampling periods. The morphological and gene 
expression differences between these populations may 
be due to phenotypic plasticity, adaptive or non-adaptive 
genetic processes, or a combination of all three [12,59]. 
Like many phenotypic traits, gene expression is an inte- 
gration of both environmental and genetic components 
[20,60]. Phenotypic plasticity itself may have a genetic 
component and may be adaptive, especially in species 
with range expansion and contraction, where coloniza- 
tion of new habitats occurs often [61,62]. Even gene 



expression differences that are purely plastic are impor- 
tant to further our understanding of ecology and coloni- 
zation, and may facilitate adaptation in other non-plastic 
traits [63]. 

Conclusions 

We have developed the first dataset characterizing gene 
expression differences between two populations of sock- 
eye salmon representing lake-type and riverine life his- 
tories. Although this represents a first step in 
considering the ecological transcriptomic differences of 
juvenile sockeye, we have already identified clear pat- 
terns relating to the divergent ecological phenotypes of 
these populations. In riverine sockeye muscle tissue, 
genes of higher expression were primarily associated 
with growth whereas in the lake-type sockeye, metabo- 
lism was the theme. Since these populations reside in a 
pristine part of the sockeye range, this study may serve 
as a reference location for future studies of populations 
that are more impacted by human activities. 

Methods 

Study site 

Aniakchak National Monument and Preserve (ANMP) 
in southwest Alaska provides a unique system to study 
these sockeye life history strategies (Figure 3). The 
ANMP has undergone several recent geologic events. A 
massive volcanic eruption 3,650 years before present (b. 
p.) formed a large caldera (Aniakchak Caldera) that 
filled with water creating a lake [64,65]. Approximately 
1,800 b.p. [66] the caldera wall collapsed resulting in a 
large flood and the formation of the Aniakchak River, 
which connects the remainder of the caldera lake (Sur- 
prise Lake; elevation 321 m) with the Pacific Ocean 
through "The Gates", a chasm opened through the cal- 
dera wall by the flood [67]. A large fluvial plain was 
established when the passing flood dropped sediment as 
it exited the caldera. Several smaller eruptions have 
occurred, including well-documented events approxi- 
mately 500 and 80 b.p. [64]. Sometime after the 500 b.p. 
eruption lake-type sockeye salmon colonized Surprise 
Lake (SL) and used the lake for juvenile rearing [68]. A 
riverine sockeye population also rears in Albert Johnson 
Creek (AJC), the largest tributary of Aniakchak River 
[35]. Albert Johnson Creek is a low gradient stream that 
meets Aniakchak River at the base of the volcano in the 
large fluvial plain that was the result of the caldera 
draining flood, 1,800 b.p.. Thus, current populations 
representing each of lake-type (SL) and riverine (AJC) 
life history types coexist in the same drainage. 

Morphology methods 

In order to make a morphological comparison between 
the two populations in this study, we reanalyzed a 
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Figure 3 The two sampling locations of this study showing Albert Johnson Creek (AJC) and Surprise Lake (SL) in Aniakchak Caldera 



subset of the morphological dataset from Pavey et al. 
2010 of 360 age 0 (meaning previous to first winter) 
juvenile sockeye with only the two current study popula- 
tions [Table 1 in [27]]. Twelve landmarks were digitized 
on each image using TpsDig (Figure 1, middle panel). 
All methods were identical except here we only com- 
pared the SL and AJC populations. All uses of animals 
in this study were approved by the animal care and use 
committee of either Simon Fraser University or Univer- 
sity of Alaska Anchorage. 

Gene expression methods 

Juvenile sockeye salmon were sampled on August 8'^ 
2007. The time of sampling for Albert Johnson Creek 
was 1535 h to 1703 h and Surprise Lake was 1832 h to 
2057 h. The entire sampling effort took place within 5.5 
hours including transportation from Albert Johnson 
Creek to Surprise Lake by a Cessna 185 airplane. A 
beach seine was used to capture fish and a strict sam- 
pling protocol including sampling time was enforced to 
reduce fish-to-fish sampling bias. Fish of similar lengths 
were sampled from each site. Mean fork length was 45.9 
mm (n = 17; SD = 3.5 mm) for AJC and 45.0 mm (n = 



13; SD = 5.6 mm) for SL. One fish from each seine haul 
was placed in a lethal solution of MS-222 (100 mg/1). 
An incision was made in the body cavity with a scalpel 
and the entire fish was placed in RNA/«ter™ (Ambion). 
The maximum time between netting a fish to RNA pre- 
servation was five minutes. The samples were kept cool 
in the field and transported, then frozen, and stored at 
-20''C to -80''C until RNA extraction. 
RNA preparation 

The samples were thawed and blotted with a Kimwipe®. 
All of the primary locomotion muscle tissue including 
red and white muscle tissue was removed from each 
fish. Total RNA was extracted with a modified protocol 
of the Invitrogen TRIzol® Plus RNA purification kit 
using PureLink™ Micro-to-Midi^"^ columns. Disruption 
and homogenization were achieved with a MixerMill 
MM301 (Retsch). The manufacturer's protocol was fol- 
lowed for each extraction with the exception of using 
150 1^1 of chloroform and 150 |il of low pH phenol to 
ensure dissociation of proteins and isolation of RNA. 
The quality of all RNA samples was verified on a 1% 
agarose gel. All samples were quantified with a Spectro- 
photometer ND-1000 (NanoDrop). 
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cDNA and aRNA synthesis and labeling 

cDNA was synthesized with Invitrogen Superscript^"^ III 
Indirect cDNA labeling system kits per manufacturer's 
instructions. In brief, 10 \ig of total RNA from a single 
individual was combined with a master mix including 
reverse-transcriptase and oligo (dT)2o primers. This 
reaction was incubated for three hours at 46°C to 
synthesize single-stranded cDNA. The samples were 
cleaned with the S.N.A.P ™ column purification proce- 
dure (Invitrogen). 

A reference pool was prepared with representative 
total RNA samples of juvenile sockeye muscle and liver, 
and adult sockeye brain, muscle and liver. Multiple tis- 
sues were used to ensure hybridization of the reference 
channel to all spots that may have been hybridized by 
the sample channel cDNA, and therefore able to be 
quantified as a ratio. The total RNA was amplified using 
an Amino AUyl MessageAmp™ II aRNA amplification 
Kit (Ambion AM1753) as per manufacturers instruc- 
tions. Briefly, RNA from several individuals from both 
populations of a single tissue type was combined. Then, 
single-stranded cDNA was synthesized from the RNA 
whereupon the second strand was synthesized with 
DNA polymerase. This product was purified through 
columns, and then amino allyl-modified aRNA was tran- 
scribed from the cDNA. The aRNA from divergent tis- 
sue types was combined in equal amounts at this point 
and this common reference pool was labeled with Cy3 
to be compared with a single individual in the experi- 
ment labeled with Cy5. 

Individual samples and reference material was coupled 
with mono-reactive CyDye™ packs (GE Healthcare). In 
short, the common reference pool of aRNA was coupled 
with Cy3 and the individual sockeye muscle tissue 
cDNA with Cy5 dyes for one hour at 4°C. The samples 
were then purified to remove all uncoupled dye using S. 
N.A.P.™ columns as per manufacturer's instructions 
(Invitrogen). The dye coupled sample and reference 
were stored at 4°C in the dark until hybridization. 
Microarray hybridization 

We used the cGRASP 16 k cDNA microarray to com- 
pare the transcriptomes of these populations with diver- 
gent life histories. This microarray consists of 16,006 
elements chosen from 300,000 Atlantic salmon and rain- 
bow trout cDNA libraries [50]. The libraries were 
derived from a variety of tissue types at different devel- 
opment stages, and conditions. Element sequences were 
chosen for minimum overlap, sequence quality, and 
other criteria [50]. 

We followed an established hybridization protocol for 
the 16 k cDNA array [50]. In brief, both 250 ng of refer- 
ence aRNA and 500 ng of sample cDNA were collected 
in a single tube and kept dark. The mixture was concen- 
trated with a speed vacuum and brought up to 23 i^l 



with RNase free water (Gibco). Hybridization buffer #3 
(Ambion) was heated to 65°C while occasionally mixing 
for one hour. The heated buffer and LNAdt blocker 
(Genesphere) was then added to the collected sample, as 
per manufacturer's instructions. We used the Tecan HS 
4800 Pro, an automated hybridization machine to hybri- 
dize sample cDNA to the arrays (Tecan). Before the 
sample injection, the programmed Tecan washed with 
several solutions containing first 1 x SSC, then 0.1 x 
SSC 0.014% SDS, then 5 x SSC, 0.01% SDS, and 0.2% 
BSA. Samples were heated to 80°C for 5 minutes, and 
then kept at 65°C until injected onto the pre-washed 
arrays in a Tecan HS 4800 Pro, as per manufacturers' 
instructions. Microarrays were hybridized for 16 hours, 
and the full protocol for the hybridization can be viewed 
in Additional File 3, Table S3. 

Post-hybridization, arrays were rinsed in the Tecan 
modules with increasingly stringent SSC and SDS solu- 
tions, starting with 2 x SSC, 0.014%SDS for four washes 
incrementally decreasing temperature, then one final 
wash of 0.2 X SSC at 23 °C. Finally, slides were dried 
with 37 psi nitrogen gas and kept dark until scanned. 
Current protocols for cGRASP microarrays are available 
at: http://web.uvic.ca/grasp/microarray/protocols/tecan_- 
hybridization_protocol.pdf 
Scanning and quantifying 

All microarrays were scanned immediately after hybridi- 
zation was complete using a ScanArray Express (Perkin- 
Elmer). The microarray images were quantified manually 
with ImaGene 5.6.1 (BioDiscovery). Spots with unusual 
morphology, offset, or other poor quality parameters 
were flagged as marginal and excluded these from 
downstream analyses. 
Array normalization and statistical analysis 
We performed all analyses in GeneSpring GX 7.3 (Agi- 
lent). The arrays were normalized as per typical two- 
color experiments by performing an array-wide inten- 
sity-dependent Lowess normalization, followed by a per 
gene normalization, which normalized each spot to the 
median value. The average base/proportionate value was 
calculated to be an intensity of 72, so we filtered data to 
retain only the 14,652 entities with average raw signal 
expression values greater than 72 in at least one of the 
populations. This became our base expression data for 
analysis. Our GeneSpring analysis was performed in two 
ways. First, the dataset was filtered to retain only the 
genes where the average differential expression was > 
1.5 fold. This list was used in a t-test without equal var- 
iances assumption {P < 0.05; no equal variance assump- 
tion) with a Benjamini and Hochberg False Discovery 
Rate multiple test correction (MTC; [69]. The spot ID's 
from the cGRASP 16K annotation file (current annota- 
tion files available at: http://web.uvic.ca/grasp/microar- 
ray/ array.html; [48]) were used to associate ESTs on the 
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array with gene descriptions. To confirm current anno- 
tation of the differential gene Ust, a Megablast was per- 
formed on associated EST sequences, or used tBLASTx 
on the associated EST, or contiguous sequence (contig). 
The default parameters were used for these database 
queries. All normalized expression values as well as raw 
data was deposited in the NCBI Gene Expression Omni- 
bus database (GEO Accession: GSE31214). 
Gene Ontology analysis 

To account for all genes potentially differentially 
expressed, not just those with high fold changes, or that 
passed highly stringent statistical methods such as those 
that passed the multiple test correction, a less stringent fil- 
tering on the base gene list was generated for the Gene 
Ontology (GO) analysis. Genes significantly differentially 
expressed by any amount that passed a t-test (P < 0.05) 
and without a multiple test correction were included. We 
then performed GO enrichment analysis on this list of 
over-expressed genes using the GO browser in Gene- 
Spring. GO categories that were significantly represented 
at a higher proportion in the over-expressed list than the 
array at large {P < 0.05) for GO Generic Slim ontology of 
both biological process and molecular function were 
produced. 

Reverse-transcription Quantitative Polymerase Chain 
Reaction (RT-qPCR) 

Total RNA samples used in microarray analysis were used 
for RT-qPCR. Single-stranded cDNA was synthesized 
from 4 \ig total RNA using Superscript® III First-Strand 
Synthesis System for RT-PCR (Invitrogen), as per manu- 
facturer's instructions. Briefly, total RNA was incubated 
for 50 min at 50°C with 5 \iM oligo(dT2o) primers. Each 
sample was then diluted 10 fold to prepare for qPCR. Four 
genes of interest (GOI) were selected for potential ecologi- 
cal relevance. Amplicons were designed within 500 base 
pairs of the 3' end of the coding sequence for each GOI in 
conserved regions between Atlantic salmon and rainbow 
trout (O. mykiss), and checked for specificity of sequence 
by BLAST. 



Primer efficiency was tested by a standard curve of 
experimental sample cDNA synthesized as described 
above. The standard curve was generated from an initial 
10-fold diluted sample which was then used as the start- 
ing point for a two-fold, 6 point serial dilution series. 
qPCR amplification was performed with SYBR 
GreenER™ qPCR SuperMix Universal master mix, as 
per manufacturer's instructions (Invitrogen), in 20 |jL 
reaction volumes containing 400 |iM primers on an 
Mx3000PT^ thermal cycler (Agilent) with the following 
thermal regime: 95°C for 7 min (1 cycle); 95°C for 30 s, 
60°C for 1 min, 72°C for 30 s (40 cycles); followed by a 
melt curve of 95°C for 30 s reading at every 0.5°C incre- 
ment. Singularity and correct product identification was 
determined by agarose gel electrophoresis, melt curve 
analysis, and amplicon sequencing. Primer sequences, 
correlation with dilution series (R ), and efficiency 
values are presented in Table 4. 

For each GOI, biological replicates were run in quadru- 
plicate on one plate with 9 and 10 biological replicates for 
lake-type and riverine ecotype conditions, respectively. 
Clear outlier technical replicates (> 0.2 Ct values from the 
other replicates) were removed from analysis. If one biolo- 
gical replicate had two technical replicates indicating one 
Ct value and the other two indicated a different Ct value, 
none were removed, as the correct pair could not be dis- 
cerned. The replicate variability was within 0.5 Ct for 110 
of 114 sample-target combinations. All NTC did not indi- 
cate the melt temperature of the GOI amplicon, and 5 of 
6 investigated genes had more than 7 Ct between the aver- 
age NTC Ct and the most dilute unknown sample (tropo- 
nin I, slow skeletal muscle was only 3.8 Ct from the 
average NTC primer dimer; SABiosciences). Additionally, 
all GOIs were contained within the standard curve dilu- 
tion series, with the exception of one malate dehydogenase 
sample which was more dilute than the most dilute point 
of the dilution series, and troponin I, slow skeletal muscle, 
which contained 3 samples more concentrated than the 
dilution series (all within 1.5 Ct of the most concentrated), 



Table 4 Gene descriptions, primer sequences, efficiency, and R2 for each gene quantified with RT-qPCR. 


Description 


Primers 5'-3' (forward, reverse) 


PCR efficiency, % 




72 kDa type IV collagenase 


5' TTC GCT GGA GAC AAG TTC TG 
5' TTT GAC GAT Cn GAG GCT ACT G 


80.1 


0.99 


Troponin 1 Slow 


5' GAG GAC TTA GGA GGG AAG TIT AAG 
5' AGA CAT GGC CTC GAG ATT CTT AC 


84.0 


0.99 


Single Mit Free 


5' AGA TGT GAG CCA GAA GAC GAC 
5' TCG AAT GTT GTC GCT TAA GAA TAG 


105.3 


0.99 


40S Ribosome 


5' GGA GAA GTG CTT CTA GAG GAG AG 
5' GCT TCT TGT CCA CCA TGT TGA G 


95.1 


0.99 


Malate dehydogenase 


5' ATT TCT AGA GTG GAG AGA GG 

5' GAA CAG GGA ATG AGT AGA TGA GG 


106.5 


0.99 


5-amino-levuleninic acid synthase 


5' ACA TGA TCG CTG TGA GAG TGT C 
5' TGA TTC GGA CTT GAG AGG TAG AC 


95.6 


0.97 
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and 8 samples less concentrated than the dilution series 
(all within 2 Ct of the least concentrated). 

Data analysis was performed using qbasePLUS (Bioga- 
zelle). All quantified genes were tested as normalizer 
candidates using geNORM. We did not include Single 
mitochondial precurser in this test as it appeared to be 
co-regulated with 5-aminolevulinate synthase in these 
samples (both transcripts are mitochondrial precursors 
and showed similar non-normalized expression patterns 
(data not shown)). The most stably-expressed transcripts 
were 40S ribosomal and 5-amino-levuleninic acid 
synthase, collectively displaying an M value of 0.645 and 
coefficient of variation of 0.225, within limits typically 
observed for stably expressed reference genes in hetero- 
geneous samples (M value < 1 and CV < 0.5) [70]. Addi- 
tionally, malate dehydogenase was identified as the third 
best candidate, and although it was not used for normal- 
ization, it was identified as non-significantly differen- 
tially expressed element in the microarray results. 

Normalized relative quantities were tested for normal- 
ity through an Anderson-Darling test (Minitab 16). Not 
all GOIs were found to display normally distributed 
expression ratios, and therefore a non-parametric, one- 
tailed Mann- Whitney U test was used to determine sig- 
nificance of fold change between the groups (a one- 
tailed test was selected as the directionality was 
expected from microarray results). 

Additional material 



Additional file 1: Update of six gene annotations that were 
differentially expressed. Updated gene annotations for the six gene 
descriptions that differed substantially from the original annotation file 
and our use of megaBLAST and tBLASTx. 

Additional file 2: Comparison of fold change results between 
microarray and RT-qPCR assays. Fold change values comparison 
between microarray and RT-qPCR assays. Bold values indicate statistically 
significantly different from 1. The full descriptions of the genes are: 72 
kDa type IV collagenase precursor (Genbank:CB510651), troponin I, slow 
skeletal muscle (Genbank:CB510901), single-stranded DNA-binding 
protein, mitochondrial precursor (Genbank:CA062007), and malate 
dehydrogenase (GenbankCA044864). 

Additional file 3: Tecan HS 400 Pro hybridization protocol Detailed 
description of all microarray hybridization steps. 
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